fix(dflash): derive n_target_layers fallback in gguf_draft_loader by javierpazo · Pull Request #138 · Luce-Org/lucebox-hub

javierpazo · 2026-05-09T12:13:48Z

fix(dflash): derive n_target_layers fallback in gguf_draft_loader

Follow-up to merged #79 ("read model params from GGUF at runtime,
support any qwen35 size"). #79 covers the target loader and the
common drafter fields, but the fallback chain in gguf_draft_loader
still requires the legacy dflash.n_target_layers key to be
present.

Drafters published with the new metadata key naming
(dflash-draft.dflash.target_layer_ids plus
n_target_features) hit the path where the legacy key is missing
and the loader fails. Concrete case: the published Q8 GGUF drafter
for Qwen3.6-27B-DFlash.

This change derives n_target_layers in two steps:

1. If target_layer_ids is present, use its length.
2. Otherwise, if n_target_features and n_embd are both
   present, use n_target_features / n_embd (with a sanity
   check that the division is exact).

If neither is available, the loader still fails with the same
honest error as before. The legacy key path is untouched.
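
The fallback chain described above can be sketched roughly as follows. This is an illustration, not the code from the PR: the DraftMeta struct, the resolve_n_target_layers function, and the error strings are hypothetical names, and the sketch assumes the metadata values have already been read from the GGUF file into optional fields. It also assumes the legacy key is checked first and that an empty target_layer_ids array is treated as absent.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical container for drafter metadata already read from the GGUF file.
// Field names mirror the keys discussed above; the struct itself is an assumption.
struct DraftMeta {
    std::optional<uint32_t> n_target_layers;               // legacy dflash.n_target_layers
    std::optional<std::vector<int32_t>> target_layer_ids;  // new-style key
    std::optional<uint32_t> n_target_features;
    std::optional<uint32_t> n_embd;
};

// Resolve n_target_layers: legacy key first, then the two derived fallbacks.
inline uint32_t resolve_n_target_layers(const DraftMeta & m) {
    if (m.n_target_layers) {
        return *m.n_target_layers;  // legacy path, left untouched
    }
    // Step 1: one entry per tapped target layer, so the array length is the count.
    // Treating an empty array as "absent" is an assumption of this sketch.
    if (m.target_layer_ids && !m.target_layer_ids->empty()) {
        return static_cast<uint32_t>(m.target_layer_ids->size());
    }
    // Step 2: derive the count from the feature width, requiring exact division.
    if (m.n_target_features && m.n_embd && *m.n_embd != 0) {
        if (*m.n_target_features % *m.n_embd != 0) {
            throw std::runtime_error("n_target_features is not a multiple of n_embd");
        }
        return *m.n_target_features / *m.n_embd;
    }
    // Neither source is available: fail loudly rather than guess.
    throw std::runtime_error("cannot determine n_target_layers from drafter metadata");
}
```

Checking the array length before the division keeps the more direct source of truth first; the division is only a last resort and is rejected when it leaves a remainder.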

Validation (RTX 6000 Ada sm_89, Qwen3.6-27B Heretic Q4_K_M target,
Q8 GGUF drafter via the new metadata):

Loaded SWA layers: 4/5, decode 21.06 tok/s, no fallback chain
errors during init.

Verification vs existing community PRs:

COMP-COMPL with #79 (merged 2026-05-03). #79 covered target
loader and drafter fields generically. This PR is a small
follow-up for the case where only the new metadata is present
on the drafter side.

Author: Javier Pazo <xabicasa@gmail.com>


cubic-dev-ai

No issues found across 1 file


…e-Org#119/Luce-Org#149 reorg

Brings in HIP/Strix Halo backend (PRs Luce-Org#119, Luce-Org#149), dflash source-layout reorg (Luce-Org#138: qwen35/, draft/, qwen3/ subdirs), GGUF draft loader fixes, daemon ubatch defaults, prefix cache + streaming tool-call fixes.

Conflicts resolved:
- dflash/CMakeLists.txt: take main's reorganized source paths; keep our gemma4_*.cpp entries; preserve the DFLASH27B_MIN_SM backwards-compat shim so gemma4_dflash_graph.cpp:621 keeps building under main's renamed _dflash27b_cuda_min_sm variable.
- dflash/deps/llama.cpp: keep our submodule pointer (eb3676f40 on feature/tq3-kv-cache-clean). Main's c79573c9b lacks the TQ3 dispatcher fixes required for Gemma4 KV correctness; if useful upstream commits land there, they should be cherry-picked into our submodule branch separately.

Verified: TQ3 64K MTP gamma=2 pflash post-merge: decode 10.58 tok/s, prefill 463 tok/s, accept 0.78; matches pre-merge baseline (10.25 / 445 / 0.78) within noise.

cubic-dev-ai Bot reviewed May 9, 2026


Merge remote-tracking branch 'origin/main' into xabicasa/dflash-gguf-draft-loader-target-layers

da04f6f

davide221 merged commit 94f15b4 into Luce-Org:main May 12, 2026
1 check passed

davide221 merged 2 commits into Luce-Org:main from javierpazo:xabicasa/dflash-gguf-draft-loader-target-layers
